 Mosul


Language Model Tokenizers Introduce Unfairness Between Languages

Neural Information Processing Systems

Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. The same text translated into different languages can have drastically different tokenization lengths, with differences up to 15 times in some cases. These disparities persist even for tokenizers that are intentionally trained for multilingual support.
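The length disparity described above can be illustrated with a minimal sketch. It assumes a byte-level tokenizer (one token per UTF-8 byte) as a stand-in; this is not one of the tokenizers the paper evaluates, and the parallel greetings, `byte_token_count`, and `length_ratio` are hypothetical names and data chosen for illustration only.

```python
# Sketch: compare tokenization lengths of "the same" text across languages.
# Assumption: a byte-level tokenizer, i.e. one token per UTF-8 byte.


def byte_token_count(text: str) -> int:
    """Return the number of tokens under the assumed byte-level tokenizer."""
    return len(text.encode("utf-8"))


def length_ratio(text: str, reference: str) -> float:
    """Return how many times longer `text` tokenizes than `reference`."""
    return byte_token_count(text) / byte_token_count(reference)


# Hypothetical parallel sample: one short greeting per language.
parallel = {
    "English": "Hello",
    "French": "Bonjour",
    "Arabic": "مرحبا",
    "Hindi": "नमस्ते",
}

# Ratio of each language's token count to the English token count.
ratios = {
    lang: length_ratio(text, parallel["English"])
    for lang, text in parallel.items()
}

for lang, ratio in ratios.items():
    print(f"{lang}: {ratio:.2f}x the English length")
```

Under this assumed tokenizer, scripts whose characters need several UTF-8 bytes produce longer token sequences for comparable content. Learned subword tokenizers behave differently in detail, so the ratios here are only indicative of the kind of measurement involved.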


Mystery of Egypt's Giza pyramids deepens as hidden megastructure 4,000 feet below is revealed

Daily Mail - Science & tech

Joe Rogan's latest podcast guest delved into controversial scans showing an enormous underground structure beneath the Great Pyramid of Giza, potentially rewriting ancient history. The scans were conducted by Italian scientist Filippo Biondi and the Khafre Project team using synthetic aperture radar. More than 200 scans from multiple satellites, including Italy's Cosmo-SkyMed and the US-based Capella Space, showed uniform results suggesting massive pillars about 65 feet in diameter wrapped in spirals and plunging nearly 4,000 feet deep. Those pillars appear to end in 260-foot cubic chambers beneath all three pyramids and the Sphinx, which Biondi described as 'huge chambers' measuring roughly 260 feet in length and width.



How terrorist groups are leveraging AI to recruit and finance their operations

The Guardian

Counter-terrorism authorities have, for years, characterized keeping up with terrorist organizations and their use of digital tools and social media apps as a game of Whac-a-Mole. Jihadist terrorist groups such as Islamic State and its predecessor al-Qaida, or even the neo-Nazi group the Base, have leveraged digital tools to recruit, covertly finance via crypto, download weapons for 3D printing and spread tradecraft to their followers, all while leaving law enforcement and intelligence agencies playing catch-up. Over time, thwarting attacks and maintaining the technological advantage over these types of terror groups has evolved, as more and more open source resources become available. Now, with artificial intelligence – both on the horizon as a rapidly developing technology and in the here and now as free, accessible apps – agencies are scrambling. Sources familiar with the US government's counterterrorism efforts told the Guardian that multiple security agencies are very concerned about how AI is making hostile groups more efficient in their planning and operations.


Linear Correlation in LM's Compositional Generalization and Hallucination

arXiv.org Artificial Intelligence

The generalization of language models (LMs) is under active debate, contrasting their potential for general intelligence with their struggles with basic knowledge composition (e.g., reverse/transition curse). This paper uncovers the phenomenon of linear correlations in LMs during knowledge composition. Specifically, there exists a linear transformation between certain pairs of related knowledge that maps the next-token prediction logits from one prompt to another, e.g., "X lives in the city of" $\rightarrow$ "X lives in the country of" for every given X. This mirrors the linearity in human knowledge composition, such as Paris $\rightarrow$ France. Our findings indicate that the linear transformation is resilient to large-scale fine-tuning, generalizing updated knowledge when aligned with real-world relationships, but causing hallucinations when it deviates. Empirical results suggest that linear correlation can serve as a potential identifier of LM generalization. Finally, we show that such linear correlations can be learned with a single feedforward network and pre-trained vocabulary representations, indicating that LM generalization heavily relies on the latter.
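The linear relationship described in this abstract can be written out as a short formula. The notation is an assumption introduced here for illustration and is not taken from the paper: $\ell_{\text{city}}(X)$ and $\ell_{\text{country}}(X)$ denote the next-token logit vectors for the two prompts, $V$ is the vocabulary, and $W$ and $b$ are a weight matrix and bias that do not depend on $X$.

```latex
% Assumed notation: logits over a vocabulary V for two related prompts.
\ell_{\text{city}}(X),\ \ell_{\text{country}}(X) \in \mathbb{R}^{|V|}

% The claimed linear correlation: a single map (W, b), shared across all X.
\ell_{\text{country}}(X) \approx W\,\ell_{\text{city}}(X) + b
\quad \text{for every } X
```

Read this way, a large entry of $W$ linking the "Paris" logit to the "France" logit would correspond to the Paris $\rightarrow$ France composition mentioned in the abstract.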


NASA astronaut spots 'two metallic spherical orbs' flying by his airplane over Texas

Daily Mail - Science & tech

A former NASA astronaut has come forward to reveal that he personally witnessed 'two metallic spherical orbs' whizz by his plane this August while flying above Texas. Leroy Chiao, who served as the commander of Expedition 10 to the International Space Station (ISS) in 2004 and 2005, was 9,000 feet in the air when objects 'zipped' on the left side of his airplane. He said one flew on top of the other and each was about three feet in diameter. 'It's just kinda dumb luck that they didn't hit me,' said Chiao. The former NASA astronaut estimates that the orbs were only 'about 20 feet away.' 'It could've been a bad result, if they had actually hit me,' Chiao said.


The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation

arXiv.org Artificial Intelligence

This report surveys the landscape of potential security threats from malicious uses of AI, and proposes ways to better forecast, prevent, and mitigate these threats. After analyzing the ways in which AI may influence the threat landscape in the digital, physical, and political domains, we make four high-level recommendations for AI researchers and other stakeholders. We also suggest several promising areas for further research that could expand the portfolio of defenses, or make attacks less effective or harder to execute. Finally, we discuss, but do not conclusively resolve, the long-term equilibrium of attackers and defenders.


The Translation of Circumlocution in Arabic Short Stories into English

arXiv.org Artificial Intelligence

This study aims at identifying and analyzing circumlocution categories and subcategories in the source language (SL) and their renditions into the target language (TL). It is based on criteria proposed for inclusion and exclusion of circumlocution. This study is concerned with the translation of literary texts, specifically short stories, from Arabic into English. It draws on four short stories selected from famous Arabic writers and their parallel translations into English. It hypothesizes that Arabic categories of circumlocution are applicable to English categories of metadiscourse, which include textual and interpersonal items. Nida's (1964) model is adopted in this study to judge the appropriateness of the translations. The study shows that the translators made serious decisions while opting for various techniques such as addition, subtraction and alteration. In this sense, it investigates whether the translators have successfully and appropriately managed to render the concept of Arabic circumlocution into English or not. The main problems that led to the inappropriate translations were also identified. This study concludes that there are many similarities between the categories of circumlocution in Arabic and the categories of metadiscourse in English. These similarities are clear when appropriate renditions are achieved.


A Survey of Large Language Models for Arabic Language and its Dialects

arXiv.org Artificial Intelligence

This survey offers a comprehensive overview of Large Language Models (LLMs) designed for the Arabic language and its dialects. It covers key architectures, including encoder-only, decoder-only, and encoder-decoder models, along with the datasets used for pre-training, spanning Classical Arabic, Modern Standard Arabic, and Dialectal Arabic. The study also explores monolingual, bilingual, and multilingual LLMs, analyzing their architectures and performance across downstream tasks such as sentiment analysis, named entity recognition, and question answering. Furthermore, it assesses the openness of Arabic LLMs based on factors such as source code availability, training data, model weights, and documentation. The survey highlights the need for more diverse dialectal datasets and emphasizes the importance of openness for research reproducibility and transparency. It concludes by identifying key challenges and opportunities for future research and stressing the need for more inclusive and representative models.


Building Damage Assessment in Conflict Zones: A Deep Learning Approach Using Geospatial Sub-Meter Resolution Data

arXiv.org Artificial Intelligence

Very High Resolution (VHR) geospatial image analysis is crucial for humanitarian assistance in both natural and anthropogenic crises, as it allows rapid identification of the most critical areas that need support. Nonetheless, manually inspecting large areas is time-consuming and requires domain expertise. Thanks to their accuracy, generalization capabilities, and highly parallelizable workload, Deep Neural Networks (DNNs) provide an excellent way to automate this task. Nevertheless, there is a scarcity of VHR data pertaining to conflict situations, and consequently, of studies on the effectiveness of DNNs in those scenarios. Motivated by this, our work extensively studies the applicability of a collection of state-of-the-art Convolutional Neural Networks (CNNs), originally developed for natural-disaster damage assessment, in a war scenario. To this end, we build an annotated dataset with pre- and post-conflict images of the Ukrainian city of Mariupol. We then explore the transferability of the CNN models in both zero-shot and learning scenarios, demonstrating their potential and limitations. To the best of our knowledge, this is the first study to use sub-meter resolution imagery to assess building damage in combat zones.